[Bug] Fix access to GPU by adding gpus all on warmup and launch warmup #36

Open
SangaraSorama wants to merge 2 commits into elephant-track:main from SangaraSorama:gpu-fix

SangaraSorama commented Mar 18, 2025

What are the changes the user will see?

GPU access should no longer fail, regardless of the user's GPU model.

Why am I making these changes?

Currently, many GPUs are not detected when ELEPHANT tries to access them, even though the container itself runs.

Example of a reported issue: https://forum.image.sc/t/issues-with-gpu-on-local-elephant-server/91424

This pull request should fix this issue.

What are the changes from a developer perspective?

Changes to the Makefile:

  • Adds --gpus all to the launch: warmup rule
  • Restructures the warmup rule so that it falls back to --gpus all if no GPU is found initially

How to test the changes?

Install ELEPHANT in two contexts:

  • on a computer with a GPU that was properly accessible before this fix (example: NVIDIA RTX A6000)
  • on a computer with a GPU that was not accessible before this fix (example: NVIDIA RTX 4070)

In both cases, the GPU should be accessed properly.

We have only tested on a computer with a GPU that was not accessible before the fix (NVIDIA GeForce RTX 4070 Laptop GPU).

SangaraSorama changed the title from "[Bug] Fix access to GPU by adding --gpus all on launch warmup" to "[Bug] Fix access to GPU by adding gpus all on warmup and launch warmup" on Mar 30, 2025
@CamilleAstrid

GPU Detection Fix

When training ELEPHANT on a dataset, the process may fall back to the CPU. To make training run on the GPU, the following changes were made to the Makefile in the Docker/elephant-server directory.

1. Force GPU usage in the launch target

In Docker/elephant-server/Makefile, within the launch target:

Before:

$(ELEPHANT_DOCKER) run -it --rm $(GPU_ARG) --shm-size=8g -v $(ELEPHANT_WORKSPACE):/workspace \

After:

$(ELEPHANT_DOCKER) run -it --rm $(GPU_ARG) --gpus all --shm-size=8g -v $(ELEPHANT_WORKSPACE):/workspace \

This change ensures that the container explicitly requests access to all GPUs.
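A rough way to see what the patched launch rule hands to Docker is to mimic the start of its command line in shell. This is a sketch under stated assumptions: launch_args is a hypothetical helper, its first parameter stands in for GPU_ARG and its second for ELEPHANT_WORKSPACE, and the options that follow the trailing backslash in the real rule are left out.

```shell
#!/bin/sh
# Hypothetical helper: prints the leading "docker run" arguments of the patched
# launch rule. $1 stands in for GPU_ARG, $2 for ELEPHANT_WORKSPACE. Options that
# follow the trailing backslash in the real Makefile rule are not reproduced.
launch_args() {
    gpu_arg="$1"
    workspace="$2"
    # $gpu_arg is left unquoted on purpose: an empty value disappears and a value
    # such as "--gpus device=0" splits into two words, as it does in the recipe.
    # shellcheck disable=SC2086
    echo run -it --rm $gpu_arg --gpus all --shm-size=8g -v "$workspace:/workspace"
}

# Without a device selection, only --gpus all is passed.
launch_args "" /path/to/workspace
# With a device selection, both --gpus options appear on the command line.
launch_args "--gpus device=0" /path/to/workspace
```

Because --gpus all is appended unconditionally, a device chosen through ELEPHANT_GPU is no longer the only GPU request Docker receives; it may be worth checking how Docker resolves the two options before relying on ELEPHANT_GPU to pin a single GPU.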

2. Improve GPU detection logic in the warmup target

In Docker/elephant-server/Makefile, the warmup rule was updated to define a fallback behavior when ELEPHANT_NVIDIA_GID or ELEPHANT_GPU is not set.

Before:

warmup:
	$(eval GPU_ARG:=$(shell \
	if [ -n "$(ELEPHANT_NVIDIA_GID)" ] && [ -n "$(ELEPHANT_GPU)" ]; then \
		VAR=$$(echo --gpus '"device=$(ELEPHANT_GPU)"'); \
	fi;\
	echo $$VAR))
	@if [ -n "$(GPU_ARG)" ]; then \
		$(ELEPHANT_DOCKER) run -it --rm $(GPU_ARG) $(ELEPHANT_IMAGE_NAME) echo "warming up GPU..."; \
	else \
		echo "CPU mode..."; \
	fi

After:

warmup:
	$(eval GPU_ARG:=$(shell \
	if [ -n "$(ELEPHANT_NVIDIA_GID)" ] && [ -n "$(ELEPHANT_GPU)" ]; then \
		VAR="--gpus device=$(ELEPHANT_GPU)"; \
	else \
		VAR="--gpus all"; \
	fi; \
	echo $$VAR))

This ensures that when a specific GPU is not defined, the Docker container still uses all available GPUs instead of falling back to CPU execution.
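The before/after selection logic can be reproduced outside make as two small shell functions, which makes the behavior of each branch easy to check. The names select_gpu_arg_before and select_gpu_arg are hypothetical; their two parameters stand in for ELEPHANT_NVIDIA_GID and ELEPHANT_GPU, and the printed text mirrors what each version of the rule stores in GPU_ARG.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the warmup rule. $1 stands in for
# ELEPHANT_NVIDIA_GID and $2 for ELEPHANT_GPU; the printed text is what the
# rule would store in GPU_ARG.

# Original logic: unless both variables are set, GPU_ARG stays empty and the
# rule takes its "CPU mode..." branch.
select_gpu_arg_before() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        # The original echo kept literal double quotes around device=...
        printf '%s\n' "--gpus \"device=$2\""
    else
        echo
    fi
}

# Patched logic: the else branch now requests every available GPU.
select_gpu_arg() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        printf '%s\n' "--gpus device=$2"
    else
        printf '%s\n' "--gpus all"
    fi
}

select_gpu_arg "" ""          # prints: --gpus all
select_gpu_arg_before "" ""   # prints an empty line
```

One consequence, assuming the rest of the rule is unchanged: since GPU_ARG is never empty after the patch, the "CPU mode..." branch of the original rule can no longer be reached, and on a host without the NVIDIA Container Toolkit the warmup docker run would likely fail instead of falling back to the CPU.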

Note

The updated Makefile is provided as an attachment in .tar.gz format.
Makefile.tar.gz

Contributors: Jan Amouroux, Célia Brahimi, and Camille-Astrid Rodrigues, Master's students in Bioinformatics and Systems Biology at the University of Toulouse, contributed to this pull request as part of their supervised research project within the DEMO team (MCD laboratory, CBI-Toulouse).
